To date, most studies applying modern Artificial Intelligence (AI) to the interpretation of flow cytometry data in hematopathology are proof-of-concept demonstrations (Seifert et al, 2023; Ng et al, 2024). They define a procedure for processing data, then use those data to train a variety of machine learning models or decision trees to identify patterns, which are ultimately used to diagnose individual cases within a small subset of hematologic malignancies. However, for AI to have meaningful clinical utility, two objectives must be achieved: demonstrated reproducibility of findings and randomized controlled trials. These would permit AI models to be treated like any other clinical laboratory test. To date, no randomized controlled trials have been published. Beyond simple retrospective analyses, only one observational study has demonstrated the power of clinical AI to diagnose Chronic Lymphocytic Leukemia (CLL) by flow cytometry (Simonson et al, 2022). Most published models apply a priori data transformation techniques to reduce the dimensionality of their case cohorts in an effort to boost performance.

We hypothesize that the tabular nature of flow cytometry data, as stored in standard FCS listmode files, confers no benefit from a priori data transformation techniques, and we therefore aimed to replicate the only published proof-of-concept study that omits such elements.

In a student-led project, the de-identified data of the CLL patient cohort used in the Kang et al study were obtained, along with the model's Python source code. Their cohort included 116 cases in total: 53 control cases labeled “normal,” as well as 44 cases of CLL and 19 cases of MBCLL, both labeled “CLL.” The study methods were independently replicated on both a MacBook Pro laptop running macOS (F1 score of 0.82) and a PC laptop running Linux (F1 score of 0.87), with results comparable to those published for their XGBoost model (F1 score of 0.88). Furthermore, efficient code writing bolstered this reproducibility, and subsequent modification improved the performance of neural-network-based models (AUC of 0.83 compared with the published result of 0.51). Finally, repeating the study with the same XGBoost model, but trained on data from a cohort of 168 CLL patients and 479 control patients from a different institution, also demonstrated the generalizability of its application (F1 score of 0.84).
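The replication workflow described above (train a gradient-boosted model on case-level tabular features, then evaluate it with an F1 score) can be sketched as follows. This is a minimal illustration, not the study's code: the synthetic features, labels, and cohort size are invented, and scikit-learn's GradientBoostingClassifier stands in for the XGBoost model used in the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for case-level tabular features derived from
# FCS listmode events (e.g., summary statistics per fluorescence channel).
n_cases, n_features = 116, 20
X = rng.normal(size=(n_cases, n_features))
y = rng.integers(0, 2, size=n_cases)  # 0 = "normal", 1 = "CLL"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Gradient-boosted trees fit directly on the tabular features, with no
# a priori dimensionality reduction, mirroring the hypothesis above.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

f1 = f1_score(y_test, model.predict(X_test))
print(f"F1 score: {f1:.2f}")
```

On random labels such as these the F1 score is uninformative; the point is only the shape of the pipeline: tabular case-level input, a boosted-tree classifier, and F1 as the reported metric.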

In pursuit of the above, we identified a number of challenges facing the creation and deployment of a federated paradigm for a clinical AI flow cytometry tool. Namely, the transformation and representation of the data matter. In a federated model, these techniques must be uniform throughout development and deployment. Moreover, variability among institutional procedures for immunofluorescent staining, panel design, and flow cytometry workflow creates ambiguity in model training and serves as a significant source of model overfitting and a potential barrier to the development of a federated model.
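The uniformity requirement can be made concrete with a small sketch: each participating site applies the exact same, versioned preprocessing step and can verify it against a shared fingerprint. The arcsinh transform, the cofactor of 150, and the versioning scheme here are illustrative assumptions, not a description of any existing federated system.

```python
import hashlib
import json

import numpy as np

# Hypothetical shared preprocessing: every site must apply the same,
# versioned transform so that features mean the same thing at training
# time and at deployment time.
PREPROCESS_SPEC = {"name": "arcsinh", "cofactor": 150.0, "version": "1.0.0"}

def preprocess_events(events: np.ndarray) -> np.ndarray:
    """Arcsinh-scale raw fluorescence intensities, a common cytometry
    transform; the cofactor is an illustrative choice."""
    return np.arcsinh(events / PREPROCESS_SPEC["cofactor"])

def transform_fingerprint() -> str:
    """A short hash sites can compare to confirm they run the same spec."""
    spec = json.dumps(PREPROCESS_SPEC, sort_keys=True)
    return hashlib.sha256(spec.encode()).hexdigest()[:12]

events = np.array([[100.0, 5000.0], [0.0, 150.0]])
features = preprocess_events(events)
```

Comparing fingerprints before pooling model updates is one simple way to catch the site-to-site preprocessing drift described above before it becomes a source of overfitting.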

This student-led study demonstrates that AI tools in practice need not be far removed from clinicians' diagnostic workflows. With minimal resources, accurate models can be developed and tuned for a myriad of applications. As such, we call on more precision laboratories at major academic institutions to establish AI-focused clinical flow cytometry research agendas.

Disclosures

No relevant conflicts of interest to declare.
